
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Gu, Jiatao, Chen, Tianrong, Berthelot, David, Zheng, Huangjie, Wang, Yuyang, Zhang, Ruixiang, Dinh, Laurent, Bautista, Miguel Angel, Susskind, Josh, Zhai, Shuangfei

arXiv.org Artificial Intelligence

We present STARFlow, a scalable generative model based on normalizing flows that achieves strong performance in high-resolution image synthesis. The core of STARFlow is Transformer Autoregressive Flow (TARFlow), which combines the expressive power of normalizing flows with the structured modeling capabilities of Autoregressive Transformers. We first establish the theoretical universality of TARFlow for modeling continuous distributions. Building on this foundation, we introduce several key architectural and algorithmic innovations to significantly enhance scalability: (1) a deep-shallow design, wherein a deep Transformer block captures most of the model's representational capacity, complemented by a few shallow Transformer blocks that are computationally efficient yet substantially beneficial; (2) modeling in the latent space of pretrained autoencoders, which proves more effective than direct pixel-level modeling; and (3) a novel guidance algorithm that significantly boosts sample quality. Crucially, our model remains an end-to-end normalizing flow, enabling exact maximum likelihood training in continuous spaces without discretization. STARFlow achieves competitive performance in both class-conditional and text-conditional image generation tasks, approaching state-of-the-art diffusion models in sample quality. To our knowledge, this work is the first successful demonstration of normalizing flows operating effectively at this scale and resolution.
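
The exact-likelihood property the abstract highlights comes from the change-of-variables formula of normalizing flows. A minimal, pure-Python sketch of an affine autoregressive flow (the building block TARFlow generalizes with Transformers; all names here are illustrative, not the paper's code):

```python
import math

def affine_ar_flow_forward(x, shift_fn, log_scale_fn):
    """Map data x to latent z dimension by dimension.

    Each x[i] is conditioned on the prefix x[:i] (in TARFlow this
    conditioning is a causal Transformer; here shift_fn/log_scale_fn
    are stand-in callables):
        z[i] = (x[i] - shift_fn(x[:i])) * exp(-log_scale_fn(x[:i]))
    The Jacobian is triangular, so its log-determinant is simply
    -sum(log_scale), which keeps exact likelihoods tractable.
    """
    z, log_det = [], 0.0
    for i in range(len(x)):
        context = x[:i]
        z.append((x[i] - shift_fn(context)) * math.exp(-log_scale_fn(context)))
        log_det -= log_scale_fn(context)
    return z, log_det

def log_likelihood(x, shift_fn, log_scale_fn):
    # log p(x) = log N(z; 0, I) + log |det dz/dx|
    z, log_det = affine_ar_flow_forward(x, shift_fn, log_scale_fn)
    log_prior = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return log_prior + log_det
```

Maximizing this log-likelihood directly, with no discretization, is what "exact maximum likelihood training in continuous spaces" refers to; STARFlow applies it in an autoencoder's latent space rather than to raw pixels.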


Design and evaluation of AI copilots -- case studies of retail copilot templates

Furmakiewicz, Michal, Liu, Chang, Taylor, Angus, Venger, Ilya

arXiv.org Artificial Intelligence

Building a successful AI copilot requires a systematic approach. This paper is divided into two sections, covering the design and the evaluation of a copilot, respectively. A case study of developing copilot templates for the retail domain by Microsoft is used to illustrate the role and importance of each aspect. The first section explores the key technical components of a copilot's architecture, including the LLM, plugins for knowledge retrieval and actions, orchestration, system prompts, and responsible AI guardrails. The second section discusses testing and evaluation as a principled way to promote desired outcomes and manage unintended consequences when using AI in a business context. We discuss how to measure and improve a copilot's quality and safety through the lens of an end-to-end human-AI decision loop framework. By examining the anatomy of a copilot and the critical aspects of testing and evaluation, this paper provides concrete evidence of how good design and evaluation practices are essential for building effective, human-centered AI assistants.
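
The architecture described in the first section can be condensed into a single orchestration loop. A deliberately minimal sketch (all names and behaviors are illustrative assumptions, not Microsoft's retail templates):

```python
def orchestrate(user_query, llm, plugins, system_prompt, guardrail):
    """One turn of a copilot: guardrail the input, ground the prompt
    with plugin retrieval, call the LLM, then screen the output."""
    if not guardrail(user_query):
        return "Sorry, I can't help with that request."
    context = "\n".join(plugin(user_query) for plugin in plugins)  # knowledge retrieval
    prompt = f"{system_prompt}\n\nContext:\n{context}\n\nUser: {user_query}"
    answer = llm(prompt)
    return answer if guardrail(answer) else "Response withheld by safety filter."
```

In a real deployment each piece is substantially richer (the orchestrator may plan multi-step tool calls, and guardrails run classifiers rather than keyword checks), but the end-to-end human-AI decision loop the paper evaluates wraps around a turn of this general shape.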


A Machine Learning Data Fusion Model for Soil Moisture Retrieval

Batchu, Vishal, Nearing, Grey, Gulshan, Varun

arXiv.org Artificial Intelligence

Soil moisture is one of the primary hydrological state (memory) variables in terrestrial systems (Dobriyal et al. 2012; Rossato et al. 2017a), and is one of the primary controls for agriculture and water management (Dobriyal et al. 2012; Rossato et al. 2017b). Soil moisture affects evapotranspiration and vegetation water availability, which are at the core of the climate-carbon cycle (Falloon et al. 2011) and play an important role in hydrological risks such as floods, drought, erosion, and landslides (Kim et al. 2019; Legates et al. 2011; Tramblay et al. 2012). Accurate measurement of soil moisture has numerous downstream benefits (Moran et al. 2015) including reduced water wastage by better understanding and managing the consumption of water (Brocca et al. 2018; Foster, Mieno, and Brozović 2020), utilising smarter irrigation methods (Kumar et al. 2014) and effective canal water management (Zafar, Prathapar, and Bastiaanssen 2021). The most accurate way to measure soil moisture is via ground-based methods such as direct gravimetric measurements (Klute 1986) or indirect methods such as dielectric reflectometry, capacitance charge, etc. (Bittelli 2011), which in-situ sensors utilize (Walker, Willgoose, and Kalma 2004). However, in-situ sensors are difficult to scale spatially, and are expensive to install and maintain.


A Customizable Dynamic Scenario Modeling and Data Generation Platform for Autonomous Driving

Shenoy, Jay, Kim, Edward, Yue, Xiangyu, Park, Taesung, Fremont, Daniel, Sangiovanni-Vincentelli, Alberto, Seshia, Sanjit

arXiv.org Artificial Intelligence

Safely interacting with humans is a significant challenge for autonomous driving. The performance of this interaction depends on machine learning-based modules of an autopilot, such as perception, behavior prediction, and planning. These modules require training datasets with high-quality labels and a diverse range of realistic dynamic behaviors. Consequently, training such modules to handle rare scenarios is difficult because they are, by definition, rarely represented in real-world datasets. Hence, there is a practical need to augment datasets with synthetic data covering these rare scenarios. In this paper, we present a platform to model dynamic and interactive scenarios, generate the scenarios in simulation with different modalities of labeled sensor data, and collect this information for data augmentation. To our knowledge, this is the first integrated platform for these tasks specialized to the autonomous driving domain.


Deliberative Acting, Online Planning and Learning with Hierarchical Operational Models

Patra, Sunandita, Mason, James, Ghallab, Malik, Nau, Dana, Traverso, Paolo

arXiv.org Artificial Intelligence

The most common representation formalisms for automated planning are descriptive models that abstractly describe what the actions do and are tailored for efficiently computing the next state(s) in a state-transition system. However, real-world acting requires operational models that describe how to do things, with rich control structures for closed-loop online decision-making in a dynamic environment. Using a different action model for planning than for acting causes problems in combining the two, in particular for the development and consistency verification of the different models. As an alternative, we define and implement an integrated acting-and-planning system in which both planning and acting use the same operational models, which are written in a general-purpose hierarchical task-oriented language offering rich control structures. The acting component, called Reactive Acting Engine (RAE), is inspired by the well-known PRS system, except that instead of being purely reactive, it can get advice from the planner. Our planner uses a UCT-like Monte Carlo Tree Search procedure, called UPOM (UCT Procedure for Operational Models), whose rollouts are simulations of the actor's operational models. We also present learning strategies for use with RAE and UPOM that acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. Our experimental results show that UPOM and our learning strategies significantly improve the acting efficiency and robustness of RAE. We discuss the asymptotic convergence of UPOM by mapping its search space to an MDP.
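
A UCT-style selection rule of the kind UPOM builds on can be sketched in a few lines (a generic illustration with hypothetical names; the paper's rollouts simulate hierarchical operational models rather than a bare reward function):

```python
import math

def uct_select(stats, c=math.sqrt(2)):
    """Pick the candidate maximizing mean value + exploration bonus.
    stats maps each method instance to (visit_count, total_value)."""
    total = sum(n for n, _ in stats.values())
    def score(m):
        n, q = stats[m]
        if n == 0:
            return float('inf')  # try unvisited methods first
        return q / n + c * math.sqrt(math.log(total) / n)
    return max(stats, key=score)

def upom_like_search(methods, rollout, n_rollouts=200):
    """UCT-like search over method instances: each rollout simulates
    executing the chosen method and returns an estimated utility."""
    stats = {m: (0, 0.0) for m in methods}
    for _ in range(n_rollouts):
        m = uct_select(stats)
        n, q = stats[m]
        stats[m] = (n + 1, q + rollout(m))
    return max(stats, key=lambda m: stats[m][0])  # most-visited method
```

The learned mapping from decision contexts to method instances described in the abstract can then seed these statistics, and the learned heuristic can replace full-depth rollouts with value estimates.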


Accelerating Wide & Deep Recommender Inference on GPUs -- NVIDIA Developer Blog

#artificialintelligence

Recommendation systems drive engagement on many of the most popular online platforms. As the growth in the volume of data available to power these systems accelerates rapidly, data scientists are increasingly turning from more traditional machine learning methods to highly expressive deep learning models to improve the quality of their recommendations. Google's Wide & Deep architecture has emerged as a popular choice of model for these problems, both for its robustness to signal sparsity and for its user-friendly implementation in TensorFlow via the DNNLinearCombinedClassifier API. While the complexity of these deep learning models can make inference initially very expensive in both cost and latency, we'll show that an accelerated, mixed-precision implementation optimized for NVIDIA GPUs can drastically reduce latency while obtaining impressive improvements in cost per inference. This paves the way for fast, low-cost, scalable recommendation systems well suited to both online and offline deployment and implemented using simple and familiar TensorFlow APIs. In this blog, we describe a highly optimized, GPU-accelerated inference implementation of the Wide & Deep architecture based on TensorFlow's DNNLinearCombinedClassifier API. The solution we propose allows for easy conversion from a trained TensorFlow Wide & Deep model to a mixed-precision inference deployment. We also present performance results of this solution based on a representative dataset and show that GPU inference for Wide & Deep models can produce up to a 13x reduction in latency or an 11x throughput improvement in online and offline scenarios respectively. While we all likely have an intuitive understanding of what it is to make a recommendation, the question of how a machine learning model might make one is much less obvious.
After all, there is something very prescriptive about the concept of a recommendation: "you should watch movie A", "you should eat the tagliatelle at restaurant B".
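
For Wide & Deep, the model's answer is a sum of two logits: a sparse linear ("wide") term for memorization and a dense neural ("deep") term for generalization. A toy, framework-free sketch of that combination (illustrative names only, not the TensorFlow API itself):

```python
import math

def wide_deep_logit(wide_features, wide_weights, deep_activations, deep_weights, bias=0.0):
    """Final logit = wide linear term + deep tower output + bias,
    mirroring what DNNLinearCombinedClassifier sums before the sigmoid."""
    wide = sum(wide_weights.get(f, 0.0) for f in wide_features)        # sparse crossed features
    deep = sum(w * a for w, a in zip(deep_weights, deep_activations))  # last hidden layer
    return wide + deep + bias

def predict_proba(logit):
    # probability that the user engages with the recommended item
    return 1.0 / (1.0 + math.exp(-logit))
```

Because both terms reduce to dense linear algebra at serving time, this is exactly the kind of computation that benefits from the mixed-precision GPU execution the blog measures.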


Integrating Acting, Planning and Learning in Hierarchical Operational Models

Patra, Sunandita, Mason, James, Kumar, Amit, Ghallab, Malik, Traverso, Paolo, Nau, Dana

arXiv.org Artificial Intelligence

We present new planning and learning algorithms for RAE, the Refinement Acting Engine. RAE uses hierarchical operational models to perform tasks in dynamically changing environments. Our planning procedure, UPOM, does a UCT-like search in the space of operational models in order to find a near-optimal method to use for the task and context at hand. Our learning strategies acquire, from online acting experiences and/or simulated planning results, a mapping from decision contexts to method instances as well as a heuristic function to guide UPOM. Our experimental results show that UPOM and our learning strategies significantly improve RAE's performance in four test domains using two different metrics: efficiency and success ratio.